Validation of actigraphy sleep metrics in children aged 8 to 16 years: considerations for device type, placement and algorithms

Background Actigraphy is often used to measure sleep in pediatric populations, despite little confirmatory evidence of the accuracy of existing sleep/wake algorithms. The aim of this study was to determine the performance of 11 sleep algorithms in relation to overnight polysomnography in children and adolescents.
Methods One hundred thirty-seven participants aged 8–16 years wore two Actigraph wGT3X-BT (wrist, waist) and three Axivity AX3 (wrist, back, thigh) accelerometers over 24 h. Gold standard measures of sleep were obtained using polysomnography (PSG; Embletta MPRPG, ST + Proxy and TX Proxy) in the home environment, overnight. Epoch by epoch comparisons of the Sadeh (two algorithms), Cole-Kripke (three algorithms), Tudor-Locke (four algorithms), Count-Scaled (CS), and HDCZA algorithms were undertaken. Mean differences from PSG values were calculated for various sleep outcomes.
Results Overall, sensitivities were high (mean ± SD: 91.8% ± 5.6%) and specificities moderate (63.8% ± 13.8%), with the HDCZA algorithm performing the best overall in terms of specificity (87.5% ± 1.3%) and accuracy (86.4% ± 0.9%). Sleep outcomes were more accurately measured by devices worn at the wrist than the hip, thigh or lower back, with the exception of sleep efficiency where the reverse was true. The CS algorithm provided consistently accurate measures of sleep onset: the mean (95%CI) difference at the wrist with Axivity was 2 min (-6; -14,) and the offset was 10 min (5, -19). Several algorithms provided accurate measures of sleep quantity at the wrist, showing differences with PSG of just 1–18 min a night for sleep period time and 5–22 min for total sleep time. Accuracy was generally higher for sleep efficiency than for frequency of night wakings or wake after sleep onset. The CS algorithm was more accurate at assessing sleep period time, with narrower 95% limits of agreement compared to the HDCZA (CS: -165 to 172 min; HDCZA: -212 to 250 min).
Conclusion Although the performance of existing count-based sleep algorithms varies markedly, wrist-worn devices provide more accurate estimates of most sleep measures than devices worn at other sites. Overall, the HDCZA algorithm showed the greatest accuracy, although the most appropriate algorithm depends on the sleep measure of focus.
Supplementary Information The online version contains supplementary material available at 10.1186/s12966-024-01590-x.


Background
A large body of evidence has emerged implicating characteristics of children's sleep such as short quantity, timing, poor quality, and high variability with a wide range of adverse health outcomes [1]. However, the majority of studies rely on retrospective self- or parent-reports of sleep, which may be unreliable and sensitive to recall bias [2,3]. Although polysomnography (PSG) is considered the gold-standard measure of sleep, it is obtrusive and impractical for large-scale studies. Thus, actigraphy is increasingly being used as a practical and suitable method to objectively measure sleep, particularly over longer time frames than is possible with PSG. To estimate sleep outcomes, actigraphy data are analysed using algorithms to classify sleep and wake based on the assumption that the presence of movement indicates wakefulness and the absence of movement indicates sleep. Typically, algorithms vary by the population studied, device worn and the placement site they were developed for (i.e. wrist, ankle, waist), but most work in a similar fashion: to define each minute of recorded activity as either sleep or wake.
However, there are several issues with these existing algorithms. First, although various algorithms have been developed [4–9], few [7,10] have been validated against the gold standard PSG in paediatric populations, with the remainder using sleep diaries or visual inspection. Second, choice of algorithm influences sleep-wake time estimates, suggesting that sleep variables derived from different algorithms might not be comparable [11]. Third, although currently available sleep algorithms provide reasonable estimates of sleep, most require participants to record their sleep onset and waking times, which are used to guide the algorithm to detect nocturnal sustained bouts of inactivity. However, sleep diaries are often inaccurate, add to participant burden, and are time consuming for researchers in large scale studies [12]. To overcome these limitations, fully automated algorithms that score sleep without diaries have been developed for use in children [5–9], but evidence of their accuracy against PSG is limited [10,13].
With the growing availability of accelerometry data from large studies, often without sleep diaries, it is necessary to establish whether sleep outcomes are comparable between brands and across various wear sites. It is also important to evaluate sleep outcome estimates between the most widely used sleep-wake algorithms, with and without the use of sleep diaries to guide the algorithm. Therefore, the aim of this study is to compare the accuracy of the most widely used sleep algorithms against overnight PSG in children and adolescents.

Participants
Children and adolescents were recruited via social media (i.e. Facebook), schools, and word of mouth. Children aged 8 to 16 years at the time of recruitment with no history of sleep disturbance (see below) were eligible for the study. Ethical approval was obtained from the University of Otago Human Ethics Committee (ref H18/073).

Data collection overview
During a visit to each participant's home, height and weight were measured and five accelerometers were attached to the child (two on the wrist, one around their waist, one on their lower back, and one on their upper thigh). These devices were worn for one 24-h period. Participants were also fitted with a portable polysomnography (PSG) machine one hour before bedtime to measure sleep during the overnight period in the home environment. Children were asked to complete a basic activity log the next day. The same computer was used to program the accelerometers and the PSG recording device, and times were synchronized.

Sleep Disturbances Scale for Children (SDSC)
Parents completed the SDSC, which consists of 27 items assessing sleep behaviour and disturbances in children in the previous six months [14]. A total sleep problem score is derived from six sleep disturbance factors. A score greater than 39 is indicative of a clinical disturbance. Children identified as having a sleep disorder, or those with any chronic medical condition or physical disability that impeded their ability to participate in physical activity, were excluded.

Demographic and anthropometric data
Information was collected on participants' age, sex, date of birth, and ethnicity using New Zealand census questions [15]. Their address was used to determine area-based socio-economic status using the New Zealand Deprivation Index (NZDep Index, 2018) [16]. Duplicate measures of height (Model 213, Seca, Germany) and weight (Tanita HD-351) were obtained by trained research assistants. An additional measure was undertaken if duplicate measures of height differed by more than 0.5 cm or if weight differed by more than 0.5 kg. Body mass index (BMI) was calculated as weight (kg) / height (m)², with overweight and obesity defined as a BMI z-score ≥ 85th but < 95th and ≥ 95th percentiles, respectively, using the WHO growth reference [17].

Home-based polysomnography
A home-based PSG sleep study was conducted in which overnight PSG data were recorded using a digital portable monitor (Embletta MPRPG, ST + Proxy and TX Proxy, Natus, California, USA) within participants' homes at a sampling rate of 500 Hz following American Academy of Sleep Medicine guidelines [18]. The researcher began the PSG set up approximately one hour before bedtime. The PSG included right and left electro-oculograms (EOG), four electroencephalograms (EEG) (C4/M1, C3/M2, O2/M1, O1/M2), chin electromyogram, nasal airflow, snoring, thoracic and abdominal respiratory effort (Xact Trace Respiratory Effort Sensor) and ECG. Oxygen saturation was measured with pulse oximetry. Data were downloaded and analysed using RemLogic software (Version 3.4, Embla Systems, Broomfield, CO, USA). Low frequency filters were set at 0.3 Hz and high frequency at 35 Hz for EEG signals. Sleep stages were scored visually by one trained sleep technician in 30 s epochs using the American Academy of Sleep Medicine (AASM) sleep staging criteria [18] for children. To allow for comparison to actigraphy, the PSG epoch lengths were collapsed into one-minute epochs. In doing so, if either 30-s epoch within the minute was scored as wake, then we considered that whole minute as wake. For PSG, sleep onset was the first epoch of sleep after lights out. Total sleep time (TST) was defined as the number of minutes from sleep onset to sleep offset minus the number of minutes awake. Wake after sleep onset (WASO) represented the duration of time spent awake after initially falling asleep, while sleep efficiency (SE) was defined in two ways: 1) SE_TIB, a commonly referenced metric, calculated as the ratio of total sleep time (TST) to time spent in bed (TIB); and 2) SE_SPT, determined by expressing total sleep time (from sleep onset to offset, minus any WASO) as a percentage of sleep period time (from sleep onset to offset, inclusive of any WASO). We chose to use the sleep period time (SPT) in our definition of SE_SPT alongside the more traditional definition which uses TIB because one of our aims was to compare the accuracy of algorithms that required sleep diaries versus those that did not. Furthermore, the definition of SE that uses TIB, by definition, includes non-sleep related activity (e.g. reading, texting, mobile phone use) both prior to initiating sleep and after the final awakening, which does not reflect the construct of SE where TST is compared to the amount of time spent attempting to initially fall asleep and sleep discontinuity. Number of awakenings was the number of overnight awakenings between sleep onset and offset. The PSG and actigraphy data were analysed independently by different researchers.
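The outcome definitions above can be illustrated with a short Python sketch. This is not the study's analysis code: the function name `sleep_metrics`, the input coding (1 = sleep, 0 = wake, one value per minute starting at lights out) and the choice to take sleep offset as the last minute scored as sleep are all assumptions made for illustration.

```python
import numpy as np

def sleep_metrics(minutes, time_in_bed_min):
    """Summarise a 1-min sleep/wake series (1 = sleep, 0 = wake).

    The series is assumed to start at lights out. Sleep offset is taken
    as the last minute scored as sleep (an assumption of this sketch).
    """
    minutes = np.asarray(minutes, dtype=int)
    asleep = np.flatnonzero(minutes == 1)
    if asleep.size == 0:
        raise ValueError("no sleep epochs in the series")
    onset, offset = int(asleep[0]), int(asleep[-1])
    window = minutes[onset:offset + 1]               # onset to offset, inclusive
    spt = int(window.size)                           # sleep period time (min)
    waso = int(np.sum(window == 0))                  # wake after sleep onset (min)
    tst = spt - waso                                 # total sleep time (min)
    awakenings = int(np.sum(np.diff(window) == -1))  # sleep-to-wake transitions
    return {
        "onset": onset,
        "offset": offset,
        "SPT": spt,
        "WASO": waso,
        "TST": tst,
        "awakenings": awakenings,
        "SE_SPT": 100 * tst / spt,               # TST as % of sleep period time
        "SE_TIB": 100 * tst / time_in_bed_min,   # TST as % of time in bed
    }
```

Because SE_SPT ignores time awake before sleep onset and after the final awakening, it will be at least as high as SE_TIB whenever time in bed is at least as long as the sleep period.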
Both accelerometers are triaxial and were configured to record at a frequency of 100 Hz and initialised using the same personal computer as the PSG. The compact size (32.5 × 23 × 8.9 mm), lightweight design (11 g), and waterproof feature of the Axivity AX3s contribute to higher compliance among children, while the inclusion of a temperature sensor assists in non-wear detection. The Actigraph wGT3X-BT is currently the most widely used research-grade device and is larger (46 × 33 × 15 mm, 19 g) than the Axivity AX3 and lacks a temperature sensor. The three Axivity accelerometers were fitted to the right side of the lower back (waist-level), middle of the right thigh, and non-dominant wrist using custom designed hypoallergenic tape. Two Actigraph accelerometers were fitted to participants at two main sites: the non-dominant wrist using an elastic wrist strap and over the right hip using custom designed hypoallergenic tape. Axivity devices were set up and data downloaded with OmGui software version 1.0.0.30 (Open Movement, Newcastle, UK). ActiGraph wGT3X-BT devices were initialised and downloaded using ActiLife version 6.13.3, saved in raw format as .gt3x, then converted for data processing. Raw acceleration data from the Actigraph and Axivity were processed and calibrated using the open-access Pampro package v0.5 [19] and converted into hdf5 file formats for processing. All algorithms except the HDCZA were written in the Python programming language (Python Software Foundation, https://www.python.org/) and outputs were computed using this same software system, rather than proprietary device software. Data analysed using the HDCZA algorithm were processed and analysed with R-package GGIR version 1.2-0 (http://cran.r-project.org) [20].

Algorithms
The selection of algorithms featured in this manuscript was informed by a review of the literature on methods commonly used to estimate sleep in pediatric populations with count-based actigraphy. Consideration was also given to algorithms integrated within the proprietary software accompanying the Actigraph GT3X+ devices. Details of how each algorithm scores sleep and wake and calculates each sleep outcome are given in Table 1. Briefly, we included three versions of the Cole-Kripke algorithm [5], two versions of the Sadeh algorithm [13], four versions of the Tudor-Locke algorithm [4,8], the count-scaled (CS) algorithm [6], and the HDCZA algorithm [9]. In general, the versions of each algorithm differed mostly by whether they required the use of diaries to estimate sleep onset and offset and whether they included variations to account for changes in sensitivity between older and newer accelerometer models.

Table 1 (continued)
Columns: Algorithm; Sleep/wake epochs scored as 1 or 0 based on; Parameter algorithm used

Algorithm: Tudor-Locke 3
Sleep/wake epochs scored as 1 or 0 based on: Sadeh algorithm [13] used in ActiLife that changes the algorithm sleep/wake thresholds based on newer model sensitivities.
Parameter algorithm used: Original Tudor-Locke automated method [8] that rescores the sleep/wake states using inclinometer data. Bedtime is identified as the first 5 consecutive minutes defined as sleep. Similarly, wake time is identified as the first 10 consecutive minutes defined as wake after a period of sleep. Bedtime and wake time are only identified when at least 160 min has elapsed between these 2 time points. An unlimited number of nonconsecutive wake minutes are allowed between bedtime and wake time, in keeping with the definition of sleep-period time that includes all sleep epochs and wakefulness after onset. Multiple sleep periods (≥ 160 min) are allowed during each 24-h day. The algorithm was constructed to output the beginning and ending minutes for each sleep period identified, but ultimately retains only the beginning minute of the first period (bedtime) in the time block studied and the final minute of the last period (wake time). Sleep-period time is ultimately calculated as the number of minutes between bedtime and wake time. Does not account for the possibility of nocturnal nonwear or extended episodes of wakefulness separating the SPT into multiple sleep episodes. Also does not consider the potential for misclassifying daytime periods of nonwear or other sedentary behaviors as sleep episodes (i.e., "naps").

Algorithm: Tudor-Locke 4
Sleep/wake epochs scored as 1 or 0 based on: Sadeh algorithm [13] used in ActiLife that changes the algorithm sleep/wake thresholds based on newer model sensitivities.
Parameter algorithm used: Refined Tudor-Locke method [4] that rescores sleep/wake states using the inclinometer to identify the probability of sleep and define parameters. Constrains the algorithm to nocturnal sleep by a rule that only allows sleep onset between 7:00 p.m. and 5:59 a.m. Sleep offset is refined and identified as the first of 10 or 20 consecutive inclinometer revised scored wake minutes, depending on the time of day (10 min: 5:00 a.m. to 11:58 a.m.; 20 min: 9:40 p.m. to 4:59 a.m.). The RSA allows identification of extended episodes of wakefulness that separate the sleep period time into distinct sleep episodes with multiple sleep onsets and offsets. If two sleep episodes were separated by less than 20 min of inclinometer re-scored wake minutes, then they were combined into a single sleep episode starting with the first minute of the first sleep episode and ending with the final minute of the last sleep episode. Sleep episodes that were separated by at least 20 min of inclinometer re-scored wake minutes were distinct within the sleep period time and were not combined. Total sleep episode time (TSET) represented the total minutes from all sleep episodes; in cases where there was a single sleep episode, or all sleep episodes were separated by less than 20 min of inclinometer re-scored wake, the TSET was equal to RSA sleep period time. Nonwear was identified when 90 consecutive minutes of 0 activity counts were encountered while allowing for up to 2 min of nonzero activity counts. The nonwear period ended when a third minute of nonzero activity counts was detected. If at least 90% of a sleep episode was categorized as nonwear, then all minutes within that sleep episode were redefined as nonwear and not included in the calculation of TSET.

Algorithm: HDCZA
Calculates wrist rotation (changes in the z-angle) for each 5-min rolling window and values under the 10th percentile over an individual day (noon-to-noon). The algorithm then detects blocks lasting > 30 min, with gaps < 60 min counted towards the identified blocks. The longest block in the day between noon and noon represents the sleep period window. Sleep episodes were defined as the sustained periods of inactivity within the sleep period window. From this, the number of sleep episodes within each sleep period window detected is calculated, as well as sleep efficiency within the sleep period window, calculated as the percentage of time asleep within the sleep period. Note, newer versions of this algorithm can use values under the 13th, 20th and 50th percentile.
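The block-detection step described for the HDCZA in Table 1 can be sketched as follows. This is a simplified illustration only and is not the GGIR implementation: it assumes the thresholding of z-angle changes has already been done and takes as input a hypothetical per-minute boolean series (`low_movement`, True where rotation is below the threshold). The function names and the 1-min epoch length are assumptions.

```python
import numpy as np

def runs_of_true(mask):
    """Return (start, end) index pairs, end exclusive, for each run of True."""
    padded = np.concatenate(([0], np.asarray(mask, dtype=int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def sleep_period_window(low_movement, min_block=30, max_gap=60):
    """Longest merged block of low movement, using 1-min epochs.

    Blocks must last more than `min_block` minutes; retained blocks
    separated by fewer than `max_gap` minutes are merged. Returns a
    (start, end) pair with end exclusive, or None if no block qualifies.
    """
    blocks = [(s, e) for s, e in runs_of_true(low_movement) if e - s > min_block]
    if not blocks:
        return None
    merged = [blocks[0]]
    for start, end in blocks[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end < max_gap:
            merged[-1] = (prev_start, end)   # gap is short: extend previous block
        else:
            merged.append((start, end))
    return max(merged, key=lambda block: block[1] - block[0])
```

In this sketch the returned window plays the role of the sleep period window; sleep episodes within it would then be the runs of low movement it contains.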

Epoch-by-epoch comparison
One-minute epochs from the Axivity thigh, wrist, and lower back and Actigraph waist and wrist were aligned with corresponding PSG epochs. Agreement between each device at each placement site (wrist, thigh, lower back, waist) and PSG (as the gold standard) was examined by calculating overall agreement (%), sensitivity (% sleep agreement), and specificity (% wake agreement).
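As an illustration of these three agreement measures, the following Python sketch computes them from two aligned one-minute series. It is not the study's code; the function name `epoch_agreement` and the coding (1 = sleep, 0 = wake) are assumptions.

```python
import numpy as np

def epoch_agreement(psg, actigraphy):
    """Epoch-by-epoch agreement (%) of actigraphy with PSG (1 = sleep, 0 = wake)."""
    psg = np.asarray(psg, dtype=bool)
    act = np.asarray(actigraphy, dtype=bool)
    if psg.shape != act.shape:
        raise ValueError("series must be aligned and of equal length")
    both_sleep = int(np.sum(psg & act))      # PSG sleep, actigraphy sleep
    both_wake = int(np.sum(~psg & ~act))     # PSG wake, actigraphy wake
    psg_sleep = int(np.sum(psg))
    psg_wake = int(np.sum(~psg))

    def percent(numerator, denominator):
        return 100 * numerator / denominator if denominator else float("nan")

    return {
        "accuracy": percent(both_sleep + both_wake, psg.size),
        "sensitivity": percent(both_sleep, psg_sleep),   # % sleep agreement
        "specificity": percent(both_wake, psg_wake),     # % wake agreement
    }
```

Because most overnight epochs are sleep, accuracy is dominated by sensitivity, which is why specificity needs to be reported separately.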
Sleep outcomes were organised into three categories: sleep timing (sleep onset and offset), sleep quantity (sleep period time and total sleep time), and sleep quality (WASO, sleep efficiency, and number of night wakings). These were described with means and standard deviations and compared to PSG by calculating the mean difference and 95% confidence interval. Only participants with data for all outcomes were included for each device and placement.
Bland-Altman plots were used to explore agreement against PSG for the "overall best performing" algorithm, regardless of placement site or device (by % accuracy), and for the "best performing algorithm" (by mean difference from PSG) for the site placement and device deemed to be the best performing for SPT (a measure dependent on sleep onset and sleep offset and not dependent on WASO) and WASO. Mean differences and 95% limits of agreement were calculated. Stata 17.0 (StataCorp, Texas) was used for all analyses.
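The mean difference and 95% limits of agreement can be computed as in the sketch below. This is an illustration rather than the Stata analysis used in the study: the function name `bland_altman` is hypothetical, and the confidence interval uses a normal approximation (1.96 × standard error), whereas a t-based interval would be slightly wider in small samples.

```python
import math
import statistics

def bland_altman(device, reference):
    """Bias, approximate 95% CI of the bias, and 95% limits of agreement."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [d - r for d, r in zip(device, reference)]   # device minus PSG
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                         # sample SD of differences
    se = sd / math.sqrt(len(diffs))
    return {
        "bias": bias,
        "ci95": (bias - 1.96 * se, bias + 1.96 * se),    # normal approximation
        "loa95": (bias - 1.96 * sd, bias + 1.96 * sd),
    }
```

The confidence interval describes uncertainty in the average difference, whereas the limits of agreement describe the range within which most individual differences are expected to fall, so the latter are always wider.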

Study participants
In total, 384 children completed the screening questionnaire. Of these, 202 were ineligible because of age (n = 4), living outside the Dunedin area (n = 12) or a sleep disturbance score greater than 39 (n = 186). A total of 182 participants were eligible to participate and of these 151 expressed further interest in the study. PSG was conducted in 138 participants, with early termination of PSG for one participant due to technical failure, leaving 137 participants included in the final analyses (see Supplementary Table 1 for details on missing data). The characteristics of the participants are shown in Table 2. The majority of participants were of New Zealand European ethnicity, slightly more boys participated than girls, and 37% of the sample were overweight or obese.

Actigraph vs Axivity at the wrist vs waist, lower back, thigh
Table 3 demonstrates that, in general, overall accuracy tended to be higher for both devices placed at the wrist (mostly greater than 80%) than when placed close to the centre of mass (waist, thigh, and lower back, where accuracy was generally less than 80%). However, different patterns were observed for sensitivity and specificity. Sensitivity, or the ability to detect episodes of sleep, was generally higher when devices were placed closer to the centre of mass than at the wrist, for both the Actigraph and Axivity. By contrast, specificity (% wake agreement) was considerably better for both devices at the wrist than at the waist.

Table 2 Characteristics of the study population
Data presented as mean (SD) except where noted.
a Uses the New Zealand Index of Deprivation 2013, which reflects the extent of material and social deprivation and is used to construct deciles from 1 (least deprived) to 10 (most deprived) [16].
b Categories based on the WHO BMI z-score cut-offs [17].

Algorithms vs placement
Site of placement did not appear to affect the overall accuracy or sensitivity for each algorithm to a great extent, as most algorithms appeared to perform similarly when placed close to the centre of mass (thigh, lower back, waist) or at the wrist, varying by less than 10% (Table 3). However, site of placement had a large effect on specificity for most algorithms, with only the HDCZA algorithm varying by less than 10% between placements.
Regardless of placement, we report similar total accuracy across the HDCZA, CS, Sadeh 1, Sadeh 2, Cole-Kripke 1, Tudor-Locke 3, and Tudor-Locke 4 algorithms, but lower accuracy for the Cole-Kripke 2, Cole-Kripke 3, Tudor-Locke 1, and Tudor-Locke 2 algorithms. Given the difficulty of actigraphy to detect periods of wakefulness during sleep, the considerably higher level of specificity for the HDCZA algorithm (ranging from 85.9% to 89.6%), compared to all others which showed specificities as low as 41.2%, with many less than 60%, should be noted.
A sensitivity analysis (Supplementary Table 2) was undertaken to determine the effect of the post-processing merge of PSG epochs into 60-s epochs. In the original analyses, if either 30-s epoch within the minute was scored as wake, we considered that whole minute as wake, whereas in the sensitivity analyses, if either 30-s epoch within the minute was scored as sleep, we considered that whole minute as sleep. For most algorithms (apart from a few placed at the wrist) this resulted in marginal increases in accuracy (< 2%) as a result of increases in specificity (the ability to detect wake-time) at the expense of decreases in sensitivity (the ability to detect sleep time).
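The two merging rules can be contrasted with a minimal Python sketch. The function name `collapse_to_minutes`, the rule labels and the coding (1 = sleep, 0 = wake) are assumptions for illustration, not the study's code.

```python
import numpy as np

def collapse_to_minutes(stages_30s, rule="wake_if_any"):
    """Collapse 30-s epochs (1 = sleep, 0 = wake) into 1-min epochs.

    rule="wake_if_any":  minute is wake if either half is wake (original analysis)
    rule="sleep_if_any": minute is sleep if either half is sleep (sensitivity analysis)
    """
    stages = np.asarray(stages_30s, dtype=int)
    pairs = stages[: stages.size // 2 * 2].reshape(-1, 2)  # drop unpaired last epoch
    if rule == "wake_if_any":
        return pairs.min(axis=1)
    if rule == "sleep_if_any":
        return pairs.max(axis=1)
    raise ValueError(f"unknown rule: {rule}")
```

The two rules differ only for minutes containing one sleep and one wake epoch, so the size of the effect depends on how often the PSG record switches state within a minute.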

Sleep outcomes
Tables 4 (Actigraph) and 5 (Axivity) report differences between each algorithm and PSG for relevant sleep outcomes of interest in three broad categories: sleep timing, sleep quantity, and sleep quality.

Table footnotes: NA, not available. a Bolded differences refer to those where actigraphy was not significantly different (P > 0.05) from PSG, and thus provides a good estimate for that sleep measure.

Sleep timing
For sleep onset, almost all algorithms detected a sleep onset significantly earlier than the PSG gold standard, with differences ranging from just 2 min to as much as 149 min for the Actigraph and 1 min to 144 min for the Axivity. Overall, differences in sleep onset were generally smaller for either device when placed at the wrist, with several algorithms providing valid estimates of sleep onset with differences of just 1-15 min compared to PSG (Actigraph hip HDCZA, Actigraph wrist CS, Sadeh 1, Sadeh 2, Tudor-Locke 3, Axivity wrist CS, Sadeh 1, Cole-Kripke 1, Tudor-Locke 3). In terms of sleep offset, differences were smaller for Actigraphs placed at the wrist than those at the hip, with all algorithms except for Tudor-Locke 2 showing small differences compared with PSG. In general, differences for the Axivity placed at the wrist were smaller than those placed at the thigh or back. However, overall, the Axivity placed on the thigh, and to a lesser extent on the back, performed better than the Actigraph at the hip, with 8 and 4 of 11 algorithms respectively reporting only small, non-significant differences compared to PSG, whereas just one algorithm (HDCZA) produced small differences with the Actigraph placed at the hip.

Sleep quantity
Tables 4 and 5 demonstrate that many of the algorithms show large differences compared with PSG, in some cases overestimating sleep by more than two hours whether measured as sleep period time or total sleep time. However, there was a clear pattern of wrist placement providing substantially more accurate estimates of sleep quantity, for both devices. For example, differences (95% CI) for the Actigraph at the wrist ranged from 1 (-12, 15) to 54 (27, 81) minutes for sleep period time, whereas the corresponding values for hip placement were up to 243 (203, 283) minutes different. A similar pattern is shown for the Axivity (Table 5). While several algorithms performed well, only a few (Sadeh 1, Cole-Kripke 1, HDCZA) consistently performed well for both devices and placement sites, and only the count-scaled algorithm showed a difference with PSG of less than 30 min for all eight measures examined (total sleep time and sleep period time at both wrist and hip for both devices).

Sleep quality
In terms of WASO, examination of Tables 4 and 5 demonstrates that actigraphy produces lower values for WASO compared with PSG for almost all sites, devices and algorithms tested. However, in general, estimates more closely matched PSG values when the device was placed at the wrist, particularly for the Actigraph, with 7 of the 11 algorithms showing small differences (differences ranging from just 5 to 22 min for these algorithms). On the other hand, better estimates of sleep efficiency were obtained from devices placed on the hip (Actigraph), thigh or back (Axivity). Regardless of device or placement, the algorithms tested resulted in small differences in sleep efficiency compared to PSG. Overall, sleep efficiency defined using TIB was lower than sleep efficiency defined using SPT and resulted in larger differences compared to PSG. Lastly, estimates of the number of night wakings differed considerably from PSG measures for most of the algorithms examined. Only 1 of the 20 Actigraph (Cole-Kripke 1 at the wrist) and 2 of the 30 Axivity (Sadeh 1 at the back and Cole-Kripke 1 at the wrist) algorithms tested did not produce large differences in waking frequency (Tables 4 and 5).

Bland-Altman
Figure 1 shows the Bland-Altman plots for agreement in SPT (a metric for sleep duration) and WASO (a metric for sleep quality) for the 'overall best performing algorithm' (HDCZA with the Axivity at the wrist), and the CS algorithm (which was the 'best performing' for the Axivity at the wrist for SPT). These plots illustrate that the CS algorithm performs better than the HDCZA for accurate assessment of SPT, with narrower 95% limits of agreement (LOA) (-165 to 172 min compared to -212 to 250 min for HDCZA). Both algorithms demonstrated similar performance for assessing WASO, with slightly narrower 95% LOA for HDCZA (CS: -279 to 260 min; HDCZA: -251 to 245 min), but both showed considerable inaccuracy in determining WASO at higher levels.

Discussion
Our study demonstrates that current count-based sleep algorithms show higher total accuracy and specificity when devices were placed at the wrist compared with other sites of wear, regardless of actigraphy brand or algorithm tested. Overall, the HDCZA algorithm demonstrated high levels of sensitivity, specificity and thus accuracy regardless of device brand or placement. In terms of the range of sleep outcomes studied, results were more variable and differed across outcomes of interest, algorithm and site of wear. Thus, researchers may choose a certain algorithm over another depending on their primary sleep outcome of interest; for example, studies of sleep timing may prefer the CS algorithm placed at the wrist, whereas studies more focussed on sleep quality may prefer the HDCZA algorithm. Poorer detection of wakefulness (poor specificity) by many of the algorithms and sites of wear continues to plague actigraphy estimates of both sleep and wake in paediatric studies [21], but specificity values are not always reported [22] despite the potential to influence data interpretation. This is also an issue in the adult field [23].
Several studies have assessed the agreement between research grade devices and PSG in healthy children, but many have been in small samples and utilised single sites of wear, devices or algorithms to detect sleep and wake states and derive sleep estimates [10,20,22]. Most of the sleep detection algorithms used in the present study have been previously developed and validated against PSG in healthy adults [5,9,13], and only a few have been validated against PSG in children [7,10], albeit in small samples (n < 40). The findings from this much larger and more comprehensive study are broadly consistent with the original validation studies and a review of previous validation studies in children, which show that accuracy (0.84-0.92) and sensitivity (0.82-0.96) are generally good, whereas specificity (0.20-0.65) is considerably lower. However, it is clear from both previous research and the current study that the specificity (54-77%) [20], or ability to detect periods of wakefulness in the sleep period window, of most algorithms was better when the device was worn at the wrist, with estimates ranging from 67 to 90%. These figures are considerably higher than those observed in adult studies, which have reported specificities of 34-46% for the HDCZA, Sadeh and Cole algorithms when validated in adult samples [9,11,21]. These discrepancies may arise because of differences in sleep characteristics between children and adults. In our study, most children had long periods of sleep without wakefulness during the night. Although immobility generally implies sleep in accelerometry-based assessment, immobility is possible during periods of wakefulness and as such can be mistakenly identified as sleep by actigraphy; it is likely this occurs more in adults because they have more periods of conscious nocturnal awakenings than children [11,19]. Our Bland-Altman plots also revealed some bias between actigraphy-measured sleep period time and PSG, where larger differences were apparent as sleep period time decreased. More wakefulness and the shorter sleep times of adults likely contribute to the greater misclassification of WASO and thus poorer specificity overall compared with children.

Fig. 1 Bland-Altman plots for sleep period time (SPT) and wake after sleep onset (WASO) for the HDCZA and CS algorithms using the Axivity at the wrist compared to PSG. Red dashed lines indicate 95% limits of agreement
The wrist placement was also superior to the thigh, lower back and hip for estimates of sleep onset, offset, quantity (TST and SPT) and WASO for most algorithms. Prior research has also indicated that hip-worn accelerometers tend to overestimate total sleep time and sleep efficiency while underestimating wake after sleep onset (WASO), resulting in lower specificity compared to wrist-worn devices [21,24]. This reduced specificity for hip-worn devices can be attributed to the algorithms being predominantly designed for wrist-specific acceleration features, which are more attuned to nocturnal movements indicative of wakefulness. Devices positioned closer to the body's center of mass, such as the waist or lower back, are likely to register less movement during the night, potentially leading to overlooked periods of wakefulness. Differing feature selection (y-axis acceleration, inclinometer data, rolling-window size, changes in z-angle, etc.) may also explain why some algorithms outperformed others when devices were worn at the same site. Although we previously reported better estimates of sleep onset using the count-scaled algorithm when devices were worn at the hip [10], this was a much smaller study in younger children, and the very small differences observed (-3 min versus 2 min) may reflect device-specific differences or alternatively age-related differences in sleep settling habits. Only sleep efficiency (both definitions) was consistently superior when devices were worn at the hip. Because most algorithms overestimated sleep offset when worn at the hip (i.e. resulted in later waking), and underestimated WASO, sleep efficiency was thus higher.

When determining the optimal placement, device and algorithm to use, systematic variation is an important aspect to consider. Systematic variation is more tractable than random variation because the direction of bias is known. In this study, the HDCZA, Sadeh 1, CS, and Cole-Kripke 1 algorithms performed well for estimates of sleep onset, offset, total sleep time and sleep period time, and importantly these estimates did not randomly vary when different devices or placements were used. For example, knowing that an algorithm, regardless of site placement or device type, always identifies sleep onset before PSG means that it can be expected to overestimate total sleep time and, in turn, sleep efficiency.
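The distinction between systematic and random variation can be illustrated with the two quantities shown in a Bland-Altman analysis: the mean difference (bias), which captures the systematic component, and the 95% limits of agreement, which capture the spread around it. The following is a minimal sketch only, assuming the conventional limits of bias ± 1.96 SD of the paired differences and a sign convention of actigraphy minus PSG. The function name and the nightly values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def bland_altman(actigraphy, psg):
    """Return (bias, lower_loa, upper_loa) for paired values in minutes.

    Assumption: differences are taken as actigraphy minus PSG, and the
    limits of agreement are bias +/- 1.96 * sample SD of the differences.
    """
    if len(actigraphy) != len(psg) or len(psg) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [a - p for a, p in zip(actigraphy, psg)]
    bias = mean(diffs)   # systematic component (direction is known)
    sd = stdev(diffs)    # random component (spread around the bias)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical total sleep time (min) for five nights, for illustration only.
psg_tst = [540, 510, 495, 560, 525]
acti_tst = [552, 520, 510, 566, 540]

bias, lower, upper = bland_altman(acti_tst, psg_tst)
print(f"bias = {bias:.1f} min, 95% LoA = {lower:.1f} to {upper:.1f} min")
```

In this made-up example every difference is positive, so the bias is a consistent overestimate with narrow limits; a device or algorithm with the same bias but wide limits would be harder to correct for.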
Many current algorithms are disadvantaged by requiring sleep onset and offset times from diaries, which pose both respondent and analysis burden. Therefore, we specifically compared sleep estimates from three different algorithms (Sadeh 1, Cole-Kripke 2, Tudor-Locke 1) with PSG using diary-recorded sleep onset and offset timings to guide the algorithm. Overall, the use of a sleep diary did not improve the level of agreement of sleep estimates between accelerometers and PSG. Although the children were asked about their sleep onset and waking times not long after awakening, it appears that estimating these timings by self-report is challenging, particularly estimating timing of sleep onset, and especially when more than one day of data are collected. These findings lend further support for using automated algorithms for detecting sleep and wake states, especially in large sample sizes.
Limitations of our study include that the accuracy in clinical populations or in children with any significant sleep disturbance is unknown, and it is not known whether these results would be similar in other age groups or those with irregular sleep patterns. Although we did not include a direct measure of sleep latency (an important sleep metric), "in-bed" time remained the same across site placement, device and algorithm, which suggests later sleep onsets would result in longer sleep latency.
The strengths of this study include the simultaneous comparison of two research-grade accelerometers worn at several sites (wrist, hip, thigh, lower back) with PSG, the rigorous reporting of actigraphy data according to recommendations for children [22], and the larger number of children included in this validation study than in most previous studies [22]. Importantly, sleep data were generated using 11 different automated sleep detection algorithms commonly reported in the literature, but not previously compared to PSG in a large sample of children and adolescents. While the comparison of accelerometers to the "gold-standard" PSG is a strength, it must be acknowledged that these two techniques measure very different signals, and actigraphy sleep scoring rules, particularly for WASO, are not entirely comparable to those of PSG. This likely explains the discrepancies, alongside the fact that actigraphy can wrongly infer sleep when children are lying awake and relatively motionless. This is particularly relevant as children settle to sleep but are still awake, and likely explains the earlier sleep onset detected by actigraphy. PSG detects sleep using changes in brain wave signals which can occur within a 30 s epoch. This rapid change may also explain the high frequencies of wakings detected by PSG, but not by actigraphy.
The differences between PSG and actigraphy methodology may also explain the large discrepancies between algorithms for estimates such as WASO and number of awakenings. Many of the algorithms define WASO as any transition between sleep and wake after sleep onset and before sleep offset, similar to PSG scoring. However, the CS algorithm aims to minimise artefactual movements detected during sleep by actigraphy and defines WASO as movements that occur over 5 continuous minutes of awake. This method of defining WASO means disagreements between PSG and actigraphy are considerably greater, but it is not clear if estimates of sleep used to demonstrate relationships with various aspects of health are affected by differences in how WASO is defined. To our knowledge, this has not been examined in the literature. In future validation studies, researchers may need to consider using a different gold standard measure of sleep, such as videosomnography, that measures constructs of sleep similar to those measured by actigraphy. Accurately discriminating between "awake" time and movement during sleep is important if the true relationships between sleep and health are of interest. Future studies should also evaluate relationships between health and sleep estimates derived using different sleep algorithms. Likewise, the brand of accelerometer and site placement that is best for accurate assessment of sleep may not necessarily align with the best choice for assessing other movement behaviours in the day (such as physical activity and sedentary behaviour). Researchers investigating 24 h movement behaviours will have to consider these results in the context of their objectives.
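The effect of the two WASO definitions described above can be contrasted on a single scored night. This is an illustrative sketch only: it assumes 1-min epochs coded 1 for sleep and 0 for wake (the coding direction is an assumption), it assumes the input covers only the period between sleep onset and sleep offset, and it ignores the additional requirement in Table 1 that an awakening be preceded and followed by 15 min of sleep. The function names and the example night are hypothetical.

```python
def waso_any_wake(epochs, epoch_min=1):
    """WASO as the total of every wake epoch within the sleep period."""
    return sum(1 for e in epochs if e == 0) * epoch_min


def waso_sustained(epochs, min_bout=5, epoch_min=1):
    """WASO counting only wake bouts lasting at least `min_bout` minutes."""
    total = 0
    run = 0
    # A trailing sleep epoch acts as a sentinel that closes any final wake run.
    for e in list(epochs) + [1]:
        if e == 0:
            run += 1
        else:
            if run * epoch_min >= min_bout:
                total += run * epoch_min
            run = 0
    return total


# Hypothetical night: wake bouts of 2, 6 and 1 min between sleep onset and offset.
night = [1] * 20 + [0] * 2 + [1] * 30 + [0] * 6 + [1] * 20 + [0] * 1 + [1] * 10

print(waso_any_wake(night))   # 9 min: all wake epochs counted
print(waso_sustained(night))  # 6 min: only the 6-min bout qualifies
```

The shorter bouts that the sustained-wake rule discards are exactly the brief wakings that PSG can score within a single 30 s epoch, which is one way the two methods can diverge.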

Conclusion
In conclusion, our study suggests that automated sleep detection algorithms applied to Actigraph and Axivity accelerometers, worn either at the lower back, hip or thigh, provide moderately comparable measures with PSG, but estimates of sleep outcomes including sleep quantity, sleep onset, sleep offset and WASO improve markedly when accelerometers are worn at the wrist. Accelerometry should be used cautiously in studies where estimates of sleep quality such as sleep efficiency and number of awakenings during the sleep period are important, or in samples of participants who experience frequent periods of wake after sleep onset.

Table 1
Scoring for each algorithm

wake epochs scored as 1 or 0 based on Parameter algorithm used
Uses 'average' estimated bedtime and waketime for population under study. To detect the bedtime sleep "event" the algorithm first moves 3 h forward to detect the first sleep onset event. If sleep is not detected in this 3 h it moves 2 h backwards to identify the last sleep onset event. If a sleep event is not detected within the 3 h after or 2 h before the chosen bedtime, the chosen bedtime (e.g. 7:30 pm) is used. Sleep onset: first of 15 continuous minutes of sleep preceded by 5 min of awake. Sleep offset: last of 15 continuous minutes of sleep followed by 5 min of awake. Awakening: 5 continuous minutes of awake preceded and followed by 15 min of sleep.
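The onset and offset rules in this table entry can be sketched as a simple scan over scored epochs. This is a simplified illustration rather than the published implementation: it assumes 1-min epochs coded 1 for sleep and 0 for wake, it searches the whole recording instead of the 3 h forward and 2 h backward window around the average bedtime, and it omits the fallback to the chosen bedtime. The function names and the example recording are hypothetical.

```python
def find_sleep_onset(epochs, sleep_run=15, wake_run=5):
    """Index of the first epoch of the earliest `sleep_run` continuous sleep
    epochs that are immediately preceded by `wake_run` wake epochs, else None."""
    for i in range(wake_run, len(epochs) - sleep_run + 1):
        preceded_by_wake = all(e == 0 for e in epochs[i - wake_run:i])
        sustained_sleep = all(e == 1 for e in epochs[i:i + sleep_run])
        if preceded_by_wake and sustained_sleep:
            return i
    return None


def find_sleep_offset(epochs, sleep_run=15, wake_run=5):
    """Index of the last epoch of the latest `sleep_run` continuous sleep
    epochs that are immediately followed by `wake_run` wake epochs, else None."""
    # j is the candidate final sleep epoch; scan backwards from the end.
    for j in range(len(epochs) - wake_run - 1, sleep_run - 2, -1):
        sustained_sleep = all(e == 1 for e in epochs[j - sleep_run + 1:j + 1])
        followed_by_wake = all(e == 0 for e in epochs[j + 1:j + 1 + wake_run])
        if sustained_sleep and followed_by_wake:
            return j
    return None


# Hypothetical recording: 10 min awake, 40 min asleep, 3 min awake,
# 30 min asleep, then 8 min awake.
recording = [0] * 10 + [1] * 40 + [0] * 3 + [1] * 30 + [0] * 8

print(find_sleep_onset(recording))   # 10
print(find_sleep_offset(recording))  # 82
```

Note that the 3-min wake bout in the middle of the example does not end the sleep period, because it is shorter than the 5 min of wake required by the offset rule.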

Table 3
Sensitivity, specificity, and accuracy of epoch-by-epoch comparisons with PSG for sleep
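As a reminder of how the epoch-by-epoch metrics in this table are conventionally derived, the sketch below treats PSG as the reference and sleep as the "positive" class, so sensitivity is the proportion of PSG sleep epochs scored as sleep and specificity is the proportion of PSG wake epochs scored as wake. It assumes time-aligned epochs coded 1 for sleep and 0 for wake; the function name and the example data are hypothetical and are not taken from this study.

```python
def epoch_agreement(actigraphy, psg):
    """Return (sensitivity, specificity, accuracy) with PSG as the reference."""
    if len(actigraphy) != len(psg):
        raise ValueError("series must be time-aligned and of equal length")
    pairs = list(zip(actigraphy, psg))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # sleep scored as sleep
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # wake scored as wake
    fp = sum(1 for a, p in pairs if a == 1 and p == 0)  # wake scored as sleep
    fn = sum(1 for a, p in pairs if a == 0 and p == 1)  # sleep scored as wake
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    accuracy = (tp + tn) / len(pairs) if pairs else nan
    return sensitivity, specificity, accuracy


# Hypothetical ten epochs: one PSG wake epoch is missed by actigraphy.
psg_epochs = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
acti_epochs = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

sens, spec, acc = epoch_agreement(acti_epochs, psg_epochs)
print(sens, spec, acc)  # 1.0 0.5 0.9
```

Because wake epochs are a small share of a typical night, a single missed wake epoch lowers specificity far more than it lowers accuracy, which is consistent with the pattern of high sensitivity and moderate specificity reported here.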

Table 4
Comparison of PSG and Actigraph GT3x measured sleep outcomes using different algorithms and at each site

Table 5
Comparison of PSG and Axivity measured sleep outcomes using different algorithms and at each site

Bolded differences refer to those where actigraphy was not significantly different (P > 0.05) from PSG, and thus provide a good estimate for that measure of sleep timing (sleep onset and offset), sleep quantity (sleep period time and total sleep time), or sleep quality (sleep efficiency, WASO, and number of night wakings).